
Conversation

Contributor

@lengmo1996 commented Jul 14, 2025

What does this PR do?

Fixes an infinite recursion when loading the pre-trained weights of Transformer2DModel with certain norm_type values.

Fixes # (issue)
PR #7647 maps Transformer2DModel to one of two variants based on norm_type: PixArtTransformer2DModel or DiTTransformer2DModel. However, some models use other norm_type values, such as ada_norm or layer_norm. Loading the pre-trained weights of a pipeline that contains such a model sends the program into endless recursion.

Specifically, calling the pipeline's from_pretrained calls Transformer2DModel.from_pretrained. Since Transformer2DModel inherits from LegacyModelMixin, this resolves to LegacyModelMixin.from_pretrained, which uses _fetch_remapped_cls_from_config to decide whether Transformer2DModel should be remapped to one of the variants. When no remapping applies, the "remapped" class is Transformer2DModel itself, and the call loops forever:

Transformer2DModel.from_pretrained → LegacyModelMixin.from_pretrained → _fetch_remapped_cls_from_config → Transformer2DModel.from_pretrained → …
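For illustration, here is a minimal, self-contained sketch of the call pattern and of the kind of guard that breaks the cycle. This is not the actual diffusers source; the class and function names mirror the ones above, but all bodies are simplified stand-ins:

class ModelMixin:
    @classmethod
    def from_pretrained(cls, path):
        # Stand-in for the real weight-loading logic.
        print(f"loading {cls.__name__} from {path}")
        return cls()


class DiTTransformer2DModel(ModelMixin): ...
class PixArtTransformer2DModel(ModelMixin): ...


def _fetch_remapped_cls_from_config(config, cls):
    # Only these two norm types have dedicated variants; any other value
    # (e.g. "ada_norm" or "layer_norm") yields the original class.
    remap = {
        "ada_norm_zero": DiTTransformer2DModel,
        "ada_norm_single": PixArtTransformer2DModel,
    }
    return remap.get(config.get("norm_type"), cls)


class LegacyModelMixin(ModelMixin):
    @classmethod
    def from_pretrained(cls, path, config=None):
        remapped_cls = _fetch_remapped_cls_from_config(config or {}, cls)
        if remapped_cls is cls:
            # The guard: no remapping is needed, so dispatch straight to
            # the real loader instead of re-entering this method.
            return super().from_pretrained(path)
        # Without the guard above, norm_type="ada_norm" would make this line
        # call LegacyModelMixin.from_pretrained on the same class, forever.
        return remapped_cls.from_pretrained(path)


class Transformer2DModel(LegacyModelMixin): ...


# Terminates with the guard; recurses until RecursionError without it.
Transformer2DModel.from_pretrained("./ckpt", config={"norm_type": "ada_norm"})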

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu
Collaborator

thanks @lengmo1996
would you be able to provide an example that will trigger the infinite loop (you can create a dummy model)? it will be easier for us to understand the problem

@lengmo1996
Contributor Author


Thanks for your quick reply, @yiyixuxu. Here is a simple example.
If I have a Transformer2DModel configured as follows:

{
  "_class_name": "Transformer2DModel",
  "_diffusers_version": "0.34.0",
  "activation_fn": "geglu-approximate",
  "attention_bias": true,
  "attention_head_dim": 88,
  "cross_attention_dim": 512,
  "dropout": 0.0,
  "in_channels": null,
  "norm_num_groups": 32,
  "num_attention_heads": 16,
  "num_embeds_ada_norm": 100,
  "num_layers": 36,
  "num_vector_embeds": 4097,
  "sample_size": 32,
  "norm_type": "ada_norm"
}

config.json
Please note that norm_type here is neither "ada_norm_zero" nor "ada_norm_single".
When I run the following Python code, I get RecursionError: maximum recursion depth exceeded:

from diffusers import Transformer2DModel

transformer = Transformer2DModel.from_pretrained("./simple_demo")

(I placed config.json in the simple_demo folder)
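For reference, one way to set up the dummy folder is to write the config above to disk. This is a hypothetical helper snippet for convenience, not part of the PR:

import json
from pathlib import Path

# Write the config shown above to ./simple_demo/config.json so that
# Transformer2DModel.from_pretrained("./simple_demo") can find it.
# Note the Python spellings: True/None instead of JSON's true/null.
config = {
    "_class_name": "Transformer2DModel",
    "_diffusers_version": "0.34.0",
    "activation_fn": "geglu-approximate",
    "attention_bias": True,
    "attention_head_dim": 88,
    "cross_attention_dim": 512,
    "dropout": 0.0,
    "in_channels": None,
    "norm_num_groups": 32,
    "num_attention_heads": 16,
    "num_embeds_ada_norm": 100,
    "num_layers": 36,
    "num_vector_embeds": 4097,
    "sample_size": 32,
    "norm_type": "ada_norm",  # neither "ada_norm_zero" nor "ada_norm_single"
}

Path("simple_demo").mkdir(exist_ok=True)
Path("simple_demo/config.json").write_text(json.dumps(config, indent=2))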

Collaborator

@yiyixuxu left a comment


thanks!

@yiyixuxu requested a review from DN6 July 14, 2025 22:35
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lengmo1996
Contributor Author

I pushed a commit to make the changed files conform to the ruff format requirements; it should pass the check_code_quality job.

Collaborator

@DN6 left a comment


Thanks for catching!

@lengmo1996
Contributor Author

I have checked the details of the failing "LoRA tests with PEFT main" check, and the failure does not seem related to this change: this PR only touches the mapping of Transformer2DModel to its two variants, while the failing test configures pipeline_components with a UNet. I also pulled the main branch and ran the test locally using the diffusers/diffusers-pytorch-cpu:latest image, and the test fails there too, i.e. even without this modification. Since my experience with the test suite is limited, I am not sure how to get this PR to pass all checks. What else can I do?

@yiyixuxu merged commit c5d6e0b into huggingface:main Jul 16, 2025
26 of 28 checks passed
tolgacangoz pushed a commit to tolgacangoz/diffusers that referenced this pull request Jul 17, 2025
…when loading certain pipelines containing Transformer2DModel (huggingface#11923)

* fix a bug about loop call

* fix a bug about loop call

* ruff format

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
tolgacangoz pushed a commit to tolgacangoz/diffusers that referenced this pull request Jul 18, 2025
…when loading certain pipelines containing Transformer2DModel (huggingface#11923)

* fix a bug about loop call

* fix a bug about loop call

* ruff format

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>